AD-A182 699

LEVELS OF ANALYSIS OF COMPLEX AUDITORY STIMULI (U) YALE
UNIV NEW HAVEN CT DEPT OF PSYCHOLOGY A G SAMUEL
16 JAN 87 AFOSR-TR-87-0861 AFOSR-84-0324


UNCLASSIFIED
F/G 6/4


REPORT DOCUMENTATION PAGE

3. Distribution/availability of report: Approved for public release; distribution unlimited.
5. Monitoring organization report number(s): AFOSR-TR-87-0861
6a. Name of performing organization: Yale University
6c. Address: Psychology Department, New Haven, CT 06520
7a. Name of monitoring organization: Air Force Office of Scientific Research/NL
7b. Address: Building 410, Bolling AFB, DC 20332-6448
8a. Name of funding/sponsoring organization: AFOSR
8b. Office symbol: NL
8c. Address: Building 410, Bolling AFB, DC 20332-6448
9. Procurement instrument identification number: AFOSR-84-0324
10. Source of funding numbers: program element no. 61102F
11. Title: Levels of Analysis of Complex Auditory Stimuli
13a. Type of report: Final
18. Subject terms: auditory psychophysics; speech perception
20. Distribution/availability of abstract: unclassified/unlimited
21. Abstract security classification: Unclassified
22b. Telephone: (202) 767-5021
22c. Office symbol: NL

DTIC ELECTE JUL 07 1987

DD FORM 1473, 84 MAR. 83 APR edition may be used until exhausted. All other editions are obsolete.
SECURITY CLASSIFICATION OF THIS PAGE: UNCLASSIFIED



AFOSR-TR-87-0861


Arthur G. Samuel, P.I.

Department of Psychology
Yale University
Box 11A Yale Station
New Haven, CT 06520

Approved for public release; distribution unlimited.


AIR FORCE OFFICE OF SCIENTIFIC RESEARCH (AFSC)
NOTICE OF TRANSMITTAL TO DTIC
This technical report has been reviewed and is
approved for public release IAW AFR 190-12.
Distribution is unlimited.

MATTHEW J. KERPER

Chief, Technical Information Division


AFOSR 84-0324 Final Report


Summary


The two-year project (AFOSR 84-0324) called for work in several areas of
complex auditory pattern perception. Our first annual report summarized research
in two of these areas. This report summarizes our efforts in four other areas. The
most detailed section of this report covers work on the perception of normal and
whispered speech. Using the selective adaptation paradigm, this study examined the
representation of stops (/b/) and continuants (/w/). The results supported the
existence of a simple acoustic, peripheral level of representation, and a complex
acoustic, central level of representation.


Three other lines of research are briefly summarized in this report. First,
several experiments tested the putative role of the syllable in the disruption of
perception under conditions of signal ear-alternation. No support was found for
the syllable playing a role in this effect. Moreover, a similar effect was found
for instrumental melodies presented with ear-alternation, suggesting that the
effect is a general property of complex auditory pattern perception.


The second brief report covers work on timbre perception. A "trumpet"-"cello"
continuum of tokens was synthesized and used in various speech perception
paradigms. The results for these nonlinguistic stimuli were similar to those for
speech, suggesting common mechanisms.


The final brief summary reviews work on the perceptual restoration of musical


Introduction


This is the final report of a two-year project entitled "Levels of Analysis
of Complex Auditory Stimuli" (AFOSR 84-0324). The project proposal had a stated
goal of clarifying the perceptual processing of complex acoustic stimuli,
especially speech and music. We believe that through our efforts over the last two
years, we have made substantial progress in achieving this goal.

Quantitatively, one can assess the project in terms of the research effort
proposed and the research accomplished. The project proposal included seven
studies, with a total of 17 experiments for the two-year period. We have collected
and analyzed data for approximately twenty experiments, from six of the seven
studies; another investigator has run a study very much like the seventh proposed
one, and reported results in accord with the proposal's prediction.

Our first annual report included a set of experiments (based on two of the
studies in the proposal) that were completed during the first 12-month period. That
research was reported at the November 1985 Psychonomic Society meeting, and
appeared in the October 1986 issue of Cognitive Psychology. Subsequent research
that we have conducted has been, and will be, presented at several professional
meetings. One study (on the possible perceptual role of the syllable) was
presented at the November 1986 Psychonomic Society meeting. Two other studies (on
music restoration, and on musical timbre) were presented at the December 1986
meeting of the Acoustical Society. Our most recently conducted work, on whispered
speech, will be presented at the November 1987 meeting of the Acoustical Society,
and is being submitted for publication in the Journal of Experimental Psychology:
Human Perception and Performance.


As should be clear from the papers and talks generated, we have collected
data and found interesting results in several domains. To keep this report
manageable, only the work on whispered speech will be reported in detail. Brief
summaries of work on the role of the syllable, music restoration, and timbre
perception will be included in this report, with detailed writeups to follow (in
the first progress report for our current grant, AFOSR 86-0357).


I. Central and Peripheral Representation of Whispered and Voiced Speech


A  basic  goal  of  research  on  speech  perception  is  to  specify  the  various  types 
of  representations  of  the  speech  signal,  beginning  with  the  vibration  pattern  on 
the  basilar  membrane,  and  culminating  in  the  understood  meaning.  Various 
intermediate  levels  of  analysis  have  been  suggested,  such  as  phones,  phonemes, 
demisyllables,  and  syllables.  The  analyses  of  possible  intermediate 
representations  focus  on  how  "encoded"  the  stimulus  is  at  any  point,  with  greater 
encoding  indicating  that  the  representation  is  further  removed  from  its  initial 
acoustic  form. 


A growing body of evidence has emerged that supports the existence of at
least two distinct levels of analysis in the early processing of speech. In recent
years, investigators have repeatedly invoked a distinction between acoustic and
phonetic representations (e.g., Fujisaki and Kawashima, 1969; Pisoni, 1973;
Sawusch, 1977). This distinction, however, is by no means uncontroversial: Some
researchers regard the acoustic representation as unimportant in speech processing
(Liberman, Isenberg, and Rakerd, 1981), while others would dispense with the
phonetic (Bailey, 1975; Klatt, 1980a). Additional controversies center on the
nature of these representations. For example, Samuel and Newport (1979) argued
that the two-level model was correct, but that the levels are better thought of as
simple and complex acoustic representations, rather than acoustic and phonetic.

Work by Sawusch (1977) and Samuel (1986), among others, suggests that the simpler
acoustic representation may be more peripheral, while the more abstract
representation may be more central. Samuel (1986) has reviewed much of the
literature supporting the distinction between two levels of analysis, and noted the
convergence of several theories in this regard (e.g., Eimas and Miller, 1978;
Samuel and Newport, 1979; Sawusch, 1977, 1986; Simon and Studdert-Kennedy, 1978;
also see Jamieson and Cheesman, 1986).

The  present  study  is  intended  to  explore  some  of  the  properties  of  the  two 
perceptual  processing  levels  postulated  on  the  basis  of  the  research  cited  above. 
Samuel  and  Newport  (1979)  argued  that  at  these  levels,  the  overall  spectral  quality 
of the input (i.e., whether it is primarily noisy or primarily periodic) is an
important  determinant  of  its  perceptual  processing.  This  claim  was  based  on  the 
results  of  three  selective  adaptation  experiments.  The  experiments  indicated  that 
if  an  adaptor  and  test  series  differed  in  spectral  quality,  labeling  shifts  did  not 
occur.  If  spectral  quality  matched,  adaptation  effects  on  speech  continua  occurred 
even  when  the  adaptors  were  nonspeech,  and  even  if  there  was  no  spectral  overlap  of 
adaptors  and  test  items.  Further  research  by  Kat  and  Samuel  (1984)  replicated 
these  findings,  and  demonstrated  that  the  mere  presence  of  a  periodic  component 
(voicing)  does  not  matter  if  there  is  sufficient  aperiodic  energy  as  well. 

The  present  study  consists  of  two  experiments  that  focus  on  the  role  of 
periodicity, and on its representation at peripheral and central levels. These
experiments  are  designed  to  determine  whether  the  importance  of  a  phoneme's  overall 
spectral  quality  derives  from  its  psychological  representation,  or  directly  from 
the acoustics of the stimulus. The approach involves using test continua that are
either  whispered  or  produced  normally,  and  periodic  and  noisy  adaptors.  Consider, 
for  example,  a  /ba-wa/  test  series.  Samuel  and  Newport  (1979)  found  that  a 
periodic  nonspeech  adaptor  changed  labeling  of  this  series,  and  that  an  aperiodic 
one  did  not.  A  central  question  addressed  in  the  present  study  is  whether  a 
whispered  /ba-wa/  continuum  (that  is  acoustically  primarily  aperiodic)  shows  the 
same  pattern.  If  so,  it  would  indicate  that  it  is  the  long-term  psychological 
representation of a speech sound that matters, rather than the particular
instantiation  of  the  moment.  If  instead  the  pattern  reverses,  with  aperiodic 
adaptors  now  producing  an  effect,  the  labeling  shifts  would  be  interpreted  as 
reflecting  the  encoding  of  the  particular  instance.  The  results  should  clarify 
whether  the  level  of  processing  that  is  being  tapped  in  demonstrations  of  the 
importance  of  spectral  quality  is  relatively  superficial,  or  if  it  reflects  more 
fundamental  properties  of  speech  perception. 


EXPERIMENT 1

Experiment 1 examined the effects of four classes of adapting sounds on the
perception of two classes of speech sounds. The latter were (1) a normally voiced
synthetic /ba-wa/ continuum, and (2) a whispered version of the /ba-wa/ continuum,
differing from the normal /ba-wa/ series in the energy source (voicing vs.
aspiration). The four classes of adaptors were periodic speech (/ba/, /wa/) and
nonspeech (the "pluck" and "bow" tones used in previous research), and their
aperiodic counterparts (speech: whispered /ba/ and /wa/; nonspeech: the "abrupt" and
"gradual" noises used by Samuel and Newport, 1979, and Kat and Samuel, 1984).


Method

Stimuli 

The test items. Two speech continua were constructed with the Klatt
cascade/parallel synthesizer (Klatt, 1980b), using the cascade branch of the
synthesizer. An eight-step /ba/-/wa/ continuum was generated that varied in rate
and duration of the formant transitions. The voiced series was energized by the
normal voicing source (AV). The /ba/ endpoint had short, rapid formant
transitions: F1, F2, and F3 all reached their steady-state values in 20 ms. The
starting frequency for F1 was 297 Hz, for F2 it was 759 Hz, and for F3 it was 2028
Hz. The final values for these three formants were 739, 1267, and 2439 Hz. The
onset value for AV was 52 dB, and it reached its full level of 60 dB in 20 ms. The
/wa/ endpoint differed from /ba/ in the rate and duration of the formant
transitions, and in rise time. The transitions and rise time were 55 ms for this
token; all other parameter values were as in the /ba/ stimulus. Six intermediate
stimuli were constructed using linear interpolation of the transition duration in
5-ms steps.
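The continuum arithmetic can be checked with a short Python sketch. It assumes, as the text implies, that every step adds the same 5 ms to the transition duration, so the eight tokens run from the 20-ms /ba/ endpoint to the 55-ms /wa/ endpoint. The helper name is hypothetical and is not part of the original synthesis software.

```python
def transition_durations(start_ms: int = 20, end_ms: int = 55, step_ms: int = 5) -> list[int]:
    """Formant-transition duration (ms) of each token on the continuum.

    Hypothetical helper: assumes equal steps from the /ba/ endpoint (start_ms)
    to the /wa/ endpoint (end_ms), both included.
    """
    if (end_ms - start_ms) % step_ms != 0:
        raise ValueError("endpoints are not a whole number of steps apart")
    return list(range(start_ms, end_ms + 1, step_ms))


steps = transition_durations()
print(len(steps), steps)  # 8 [20, 25, 30, 35, 40, 45, 50, 55]
```

The six intermediate stimuli are `steps[1:-1]`, i.e., the 25- through 50-ms tokens.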

An analogous whispered /ba/-/wa/ series was constructed with identical
parameter values other than a switch in the energy source. Rather than exciting
the formants with the periodic voicing source, the aperiodic aspiration source (AH)
was used. The eight whispered tokens were digitally reduced in amplitude in order
to match their perceived loudness to the voiced stimuli.

All  stimuli  were  300  ms,  and  all  decayed  to  silence  during  the  last  100  ms. 
The  fundamental  frequency  for  the  voiced  stimuli  was  122  Hz  for  the  first  100  ms  of 
each  stimulus,  and  dropped  linearly  over  the  last  200  ms  to  90  Hz. 
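The fundamental-frequency contour of the voiced tokens is piecewise linear. This is a minimal sketch assuming a flat 122 Hz segment followed by a straight-line fall that reaches 90 Hz exactly at the 300-ms offset; the function name is illustrative.

```python
def f0_hz(t_ms: float) -> float:
    """F0 (Hz) of a voiced token at time t_ms after stimulus onset.

    Assumes 122 Hz for 0-100 ms, then a linear fall to 90 Hz at 300 ms.
    """
    if not 0 <= t_ms <= 300:
        raise ValueError("stimuli are 300 ms long")
    if t_ms <= 100:
        return 122.0
    # 32 Hz fall spread evenly over the final 200 ms (0.16 Hz per ms)
    return 122.0 - (t_ms - 100) * (122.0 - 90.0) / 200.0


print(f0_hz(50), f0_hz(200), f0_hz(300))  # 122.0 106.0 90.0
```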
The adaptors. Eight stimuli served as adaptors. Four speech adaptors were
used: the /ba/ and /wa/ endpoints of the normal and whispered test series. The
remaining four adaptors were the periodic "pluck" and "bow", and the aperiodic
"abrupt" noise and "gradual" noise stimuli used by Samuel and Newport (1979) and
Kat and Samuel (1984). The pluck and bow are sawtooth wave stimuli with a 440 Hz
fundamental; the abrupt and gradual are white noise segments. The particular
versions used in this study were 350 ms, approximately matched to the duration of
the speech adaptors. The pluck and abrupt stimuli had nominal rise times of 0 ms
(with actual times less than 4 ms); the bow and gradual adaptors had rise times of
approximately 80 ms. All adaptors gradually decayed to silence.

Procedure 

Subjects participated in eight one-hour sessions, with sessions separated by
at least 24 hours. Each session included a baseline identification test and an
adaptation test. On both tests, subjects heard the normal and whispered /ba/-/wa/
stimuli, and identified each token as either "B" or "W", using labeled response
buttons. The identification test consisted of 18 randomizations of the 16
syllables (eight normal and eight whispered). The first three randomizations were
practice, and were not scored. The adaptation test included 15 randomizations of
the test items, with an adaptation sequence preceding the labeling of blocks of
eight identification stimuli. More specifically, subjects initially heard 45
repetitions (30 sec) of an adaptor, followed by eight randomized syllables to
identify, followed by another 45 adaptor repetitions, followed by eight more
syllables, and so on. There was an additional minute (90 repetitions) of
adaptation at the beginning of the adaptation test.

Adaptors were presented at a rate of 1.5 repetitions per second, and subjects
were allowed up to 4.5 seconds to respond to each identification stimulus. They
were encouraged to respond accurately and quickly. When all subjects had
responded, or 4.5 seconds had elapsed, the next stimulus was presented after a
one-second wait.
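The timing figures in the procedure can be cross-checked: at 1.5 repetitions per second, 45 repetitions last 30 s and the extra 90 repetitions last 60 s. The sketch below also derives the adaptor load of one adaptation test, assuming that each of the 15 scored randomizations contains all 16 syllables and that every block of eight is preceded by a 45-repetition run. The totals it prints are derived from those assumptions rather than reported in the text, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdaptationTest:
    randomizations: int = 15           # passes through the test items
    items_per_randomization: int = 16  # 8 voiced + 8 whispered syllables
    block_size: int = 8                # syllables labeled after each adaptor run
    reps_per_block: int = 45           # adaptor repetitions before each block
    initial_extra_reps: int = 90       # additional adaptation at test onset
    reps_per_second: float = 1.5       # adaptor presentation rate

    def n_blocks(self) -> int:
        total_items = self.randomizations * self.items_per_randomization
        return total_items // self.block_size

    def total_reps(self) -> int:
        return self.n_blocks() * self.reps_per_block + self.initial_extra_reps

    def adaptor_seconds(self) -> float:
        return self.total_reps() / self.reps_per_second


test = AdaptationTest()
print(test.reps_per_block / test.reps_per_second)      # 30.0
print(test.initial_extra_reps / test.reps_per_second)  # 60.0
print(test.n_blocks(), test.total_reps(), test.adaptor_seconds())  # 30 1440 960.0
```

Under these assumptions each adaptation test contains 30 labeling blocks and 1440 adaptor repetitions, or 16 minutes of adaptor presentation.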

Subjects were run in groups of three; four such groups were run. Two Latin
squares were used to counterbalance the order of adaptation conditions. For the
first four sessions (involving speech adaptors), the Latin square revolved around
the order /ba/, whispered /wa/, /wa/, and whispered /ba/. In the final four
sessions (involving nonspeech adaptors), the Latin square revolved around the order
pluck, gradual, bow, and abrupt.
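A Latin square that "revolves around" an order can be built by rotating that order one position per row, so that each condition occurs once in every session position and once for every group. The sketch below assumes this cyclic construction and also assumes that row i of each square went to group i; the report states neither detail, and the function name is hypothetical.

```python
def cyclic_latin_square(order: list[str]) -> list[list[str]]:
    """Each row is the base order rotated left by the row index.

    Every condition therefore appears exactly once in each row (group)
    and once in each column (session position).
    """
    n = len(order)
    return [[order[(row + col) % n] for col in range(n)] for row in range(n)]


speech = cyclic_latin_square(["/ba/", "whispered /wa/", "/wa/", "whispered /ba/"])
nonspeech = cyclic_latin_square(["pluck", "gradual", "bow", "abrupt"])

# Assumed pairing: group i receives row i of both squares.
for group, (first_half, second_half) in enumerate(zip(speech, nonspeech), start=1):
    print(f"group {group}: sessions 1-4 {first_half}; sessions 5-8 {second_half}")
```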


Subjects 


Twelve paid subjects participated in the eight sessions. All were native
English speakers with no known hearing problems. One of the subjects failed to
label the syllables consistently, and was not included in any of the analyses to be
reported.


Results  and  Discussion 


Insert Figure 1 About Here


Insert  Figure  2  About  Here 


Insert  Table  1  About  Here 


For each subject, the average percentage of "B" responses was calculated for
each test stimulus, both on the baseline and adaptation tests. The group labeling
functions for the voiced adaptors are shown in Figure 1; comparable data for the
whispered adaptors are shown in Figure 2. For statistical tests of adaptation
effects, a boundary shift measure was calculated by subtracting average "B" report
after adaptation from such report before adaptation; only test items near the
phoneme boundary (items 3-6) were used in these calculations. Using this measure,
Table 1 presents the mean adaptation shifts for all of the conditions of
Experiment 1. The Table also indicates which shifts were significant by two-tailed
t-tests.
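The boundary shift measure and its significance test can be sketched as follows. The function names are hypothetical. The sketch assumes percent-"B" scores for continuum items 1-8 and computes a one-sample t statistic with n - 1 degrees of freedom; its two-tailed p value would then be read from a t table.

```python
import math
from statistics import mean, stdev


def boundary_shift(baseline_pct_b: list[float], adapted_pct_b: list[float]) -> float:
    """Baseline minus post-adaptation "B" report, averaged over items 3-6.

    Inputs hold percent "B" for continuum items 1-8 (index 0 = item 1).
    """
    if len(baseline_pct_b) != 8 or len(adapted_pct_b) != 8:
        raise ValueError("expected eight continuum items")
    near_boundary = range(2, 6)  # zero-based indices of items 3, 4, 5, 6
    return mean(baseline_pct_b[i] - adapted_pct_b[i] for i in near_boundary)


def one_sample_t(shifts: list[float]) -> tuple[float, int]:
    """t statistic and df for H0: mean shift = 0 across subjects."""
    n = len(shifts)
    t = mean(shifts) / (stdev(shifts) / math.sqrt(n))
    return t, n - 1
```

A positive shift means fewer "B" responses after adaptation; one shift per subject and condition would be passed to `one_sample_t`.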

Not surprisingly, speech adaptors were all reliably effective when paired
with the continua from which they were drawn. Three of the four cross-series
adaptation conditions (whispered adaptor on voiced continuum or vice versa) also
yielded significant shifts. The exception was the non-effect of a whispered /ba/
adaptor with voiced test items. This failure of adaptation probably reflects two
factors. First, the voiced test series was perceptually less labile than the
whispered continuum: shifts were generally smaller. Second, there is apparently a
substantial asymmetry in the efficacy of /ba/ and /wa/ as adaptors. Across voiced
and whispered stimuli, the average shift of 11.6% for /ba/ adaptors was only 36% as
large as the 32.1% average for /wa/. This asymmetry is evident in all four
comparisons of /ba/ and /wa/ formed by the crossing of periodicity of adaptor and
test series.

Insert  Figure  3  About  Here 


Insert  Figure  4  About  Here 

The results for the nonspeech adaptation conditions are shown in Figures 3
and 4, and summarized in Table 1. A central question addressed in Experiment 1
involves the effect (or non-effect) of the pluck adaptor. Recall that previous
research had found that an adaptor similar to the one used here significantly
reduced /ba/ report (Diehl, 1976; Samuel and Newport, 1979). For the voiced test
items, the 3% effect found in Experiment 1 is comparable to shifts found in earlier
research, but did not reach significance. Fortunately, the critical prediction of
this study involved the effect of the pluck adaptor on the whispered /ba/-/wa/
syllables: Does the periodic nonspeech sound work at a straightforward acoustic
level, or is the effect on a more abstract representation? The significant
reduction of /ba/ report indicates that the effect is on a more abstract
representation, since the periodic pluck shares features with the underlying
(voiced) representation of the whispered /ba/, but not with its aperiodic surface
realization. The larger effect on the whispered stimuli than on the voiced ones
presumably is another reflection of the greater lability of the whispered
syllables.

The  results  for  the  other  nonspeech  adaptors  were,  as  expected,  small  and 
nonsignificant. In previous research (Samuel and Newport, 1979), the bow, abrupt,
and gradual adaptors had been ineffective on a voiced /ba-wa/ test series; the
present  results  replicate  this.  The  results  for  the  whispered  stimuli  were  also 
nonsignificant,  although  there  were  trends  in  the  appropriate  direction  for  the  two 
aperiodic  adaptors.  These  trends  could  reflect  the  operation  of  low-level  acoustic 
adaptation,  given  the  shared  aperiodicity  and  onset  characteristics  of  the  adaptors 
and  whispered  test  syllables. 

In sum, Experiment 1 yielded two very interesting results. First and
foremost, the efficacy of the pluck adaptor on the whispered /ba-wa/ stimuli
demonstrates that the effect is occurring at an abstract level of representation.
Second, there is a marked asymmetry in the effects of adaptation with /ba/ and with
/wa/: The effects for /ba/ are much smaller than those for /wa/, and they appear
to be more sensitive to matching the adaptor and test series. Experiment 2 provides
a test of the replicability of the first point, and examines whether the second can
be traced to differences in the central and peripheral representations of /ba/ and
/wa/.

EXPERIMENT 2


The success of the periodic nonspeech adaptor in changing the identification
of whispered /ba/-/wa/ syllables demonstrates that such shifts can occur in the
absence of substantial spectral similarity. Recall that this effect was predicted
on the assumption that /b/ and /w/ are underlyingly periodic, and that it is at
this more abstract level that the pluck is causing labeling shifts. In terms of
the two-level model discussed in the Introduction, these effects would be traced to
the complex acoustic, or central, level. The fact that the adaptor was
nonlinguistic indicates that this level is not speech-specific.

A number of investigators have compared ipsilateral and contralateral
presentation of adaptors and test items to separate peripheral from central effects
(e.g., Ades, 1974; Eimas, Cooper, and Corbit, 1973; Ganong, 1978; Jamieson and
Cheesman, 1986; Ohde, 1982; Samuel, 1986; Sawusch, 1977). The rationale for this
manipulation is that central effects are assumed to occur at a level beyond the
point at which information from the two ears is combined. As such, adaptation
effects found under contralateral testing conditions are inferred to reflect
central processing. In contrast, effects under ipsilateral conditions should be
due to both central and peripheral mechanisms (assuming the existence of both). The
prediction for the pluck adaptor in this paradigm should be clear: If the effect
observed in Experiment 1 really is due to central mechanisms, contralateral testing
should be equivalent to ipsilateral testing. Moreover, because the adaptation is
hypothesized to occur at an abstract level, the effect should be no different for
whispered and normal /ba/-/wa/ test items.
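The laterality logic amounts to a subtraction: contralateral adaptation estimates the central component, and the ipsilateral excess over it estimates the peripheral component. The sketch assumes the two components simply add, which the rationale above implies but does not state explicitly; the function name is hypothetical.

```python
def decompose_shift(ipsilateral: float, contralateral: float) -> dict[str, float]:
    """Split an ipsilateral adaptation shift into central and peripheral parts.

    Assumes (1) contralateral adaptation reaches only the central level and
    (2) central and peripheral contributions add under ipsilateral adaptation.
    """
    central = contralateral
    peripheral = ipsilateral - contralateral
    return {"central": central, "peripheral": peripheral}


print(decompose_shift(2.0, 1.0))  # {'central': 1.0, 'peripheral': 1.0}
```

With same-ear shifts twice as large as cross-ear shifts, the two estimates come out equal. With equal shifts, as predicted for the pluck, the peripheral estimate is zero. A contralateral shift larger than the ipsilateral one yields a negative peripheral estimate, which this additive model cannot interpret.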

This  same-ear  versus  different-ear  methodology  can  also  be  profitably  applied 
to an analysis of possible differences in the representation of voiced stops and
continuants.  The  relatively  large  shifts  found  with  /w/  adaptation  could  reflect 
effects  at  both  central  and  peripheral  levels  of  representation.  The  smaller 
shifts  found  for  the  /b/  conditions  might  be  due  to  the  absence  of  an  effect  at  one 
of  these  levels.  The  /b/-like  effect  for  the  pluck  adaptor,  just  postulated  to  be 
centrally  mediated,  suggests  that  an  adaptable  central  representation  for  /b/  is 
called  for.  On  the  other  hand,  Jamieson  and  Cheesman  (1986)  have  recently  argued 
that the peripheral level is the primary locus for adaptation of voiced stops.
Experiment  2  uses  the  same-ear/cross-ear  methodology,  along  with  the  nonspeech 
adaptor  and  voiced/whispered  stimulus  distinction,  to  try  to  clarify  the 
representation  of  stops  and  continuants. 

Method

Stimuli 

The same two test series (voiced /ba/-/wa/ and whispered /ba/-/wa/) that
were used in Experiment 1 were used in Experiment 2. Five adaptors were used: the
endpoints of each test series, and the nonspeech pluck.

Procedure 

Subjects participated in ten one-hour sessions of the same form used in
Experiment 1. Four groups of subjects were run. For two of these groups, the
order of adaptors over the first five days was /ba/, /wa/, pluck, whispered /ba/,
and whispered /wa/; this order was reversed for the last five sessions. For one of
these groups, the test items and adaptors were presented ipsilaterally for the
first five sessions, and contralaterally for the last five. For the second group,
the five ipsilateral conditions followed the five contralateral ones. The
remaining two groups followed a similar counterbalancing procedure. Their first
five sessions followed the adaptor order whispered /wa/, whispered /ba/, pluck,
/wa/, and /ba/. Test items were presented to the left ear for subjects in the
first two groups, and to the right ear for the other two.
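The Experiment 2 counterbalancing can be expressed as a session schedule. In this sketch, groups 1 and 2 are the two groups described first; note that the second adaptor order is simply the first one reversed. The report does not say which of groups 3 and 4 ran the ipsilateral block first, so the sketch assumes group 3 did. The class and function names are hypothetical.

```python
from dataclasses import dataclass

ORDER_A = ["/ba/", "/wa/", "pluck", "whispered /ba/", "whispered /wa/"]


@dataclass(frozen=True)
class Session:
    number: int
    adaptor: str
    laterality: str  # "ipsilateral" or "contralateral"
    test_ear: str    # ear receiving the test syllables


def schedule(group: int) -> list[Session]:
    """Ten-session plan for one Experiment 2 group (sketch).

    Groups 1-2 start with ORDER_A and groups 3-4 with its reverse; every group
    reverses its order for sessions 6-10. Odd-numbered groups are assumed to run
    the ipsilateral block first, even-numbered groups the contralateral block.
    """
    if group not in (1, 2, 3, 4):
        raise ValueError("groups are numbered 1-4")
    first = ORDER_A if group <= 2 else ORDER_A[::-1]
    adaptors = first + first[::-1]
    halves = ["ipsilateral", "contralateral"]
    if group % 2 == 0:
        halves.reverse()
    ear = "left" if group <= 2 else "right"
    return [
        Session(i + 1, adaptor, halves[i // 5], ear)
        for i, adaptor in enumerate(adaptors)
    ]


for session in schedule(3):
    print(session)
```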

Subjects 

Twelve subjects from the same population as those in Experiment 1
participated in Experiment 2. One subject's data were not included in the analysis
due to his failure to label the syllables consistently, and another subject's data
were lost due to computer error.


Results  and  Discussion 


As  in  Experiment  1,  for  all  subjects  the  percentage  of  stimuli  labeled  "B" 
was  calculated  for  each  token  in  each  condition.  As  before,  the  measure  of 
adaptation  is  the  difference  in  proportion  of  stimuli  labeled  "B"  before  and  after 
adaptation  using  the  center  four  items  of  the  eight-item  continuum.  Separate 
analyses  of  variance  were  conducted  on  these  scores  for  the  pluck  adaptation 
conditions,  the  /wa/  adaptation  conditions,  and  the  /ba/  adaptation  conditions. 
Insert Figure 5 About Here


Insert  Table  2  About  Here 


The results for the pluck adaptor are illustrated in Figure 5. A two-factor
analysis of variance was conducted on the pluck data, examining the effects of
laterality (ipsilateral versus contralateral adaptation) and test continuum
periodicity (voiced versus whispered). Collapsing across these factors, the grand
mean shift was 7.7%, F(1,9) = 7.59, p<.03. As Figure 5 shows, there were no
notable effects of laterality or periodicity (both F<1); there was also no
interaction of these factors, F(1,9) = 1.00, n.s. Thus, the results nicely
replicate those of Experiment 1, and confirm the abstract nature of the affected
representation: A central representation that is relatively insensitive to the
acoustic details of a stimulus (e.g., its periodicity) would produce exactly this
pattern of results.
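Because every factor in these analyses has two levels and is varied within subjects, each effect has one numerator degree of freedom. Its F ratio equals the square of a one-sample t computed on per-subject contrast scores; with ten subjects this gives the F(1,9) tests reported here. The sketch below uses that equivalence. It is not the authors' analysis software, and the function names and data layout are assumptions.

```python
import math
from itertools import combinations, product
from statistics import mean, stdev


def f_from_contrast(scores: list[float]) -> float:
    """Squared one-sample t for H0: mean contrast = 0; equals F(1, n - 1)."""
    n = len(scores)
    sd = stdev(scores)
    if sd == 0:
        return math.nan  # no subject-to-subject variability: F is undefined
    t = mean(scores) / (sd / math.sqrt(n))
    return t * t


def within_subjects_2k(
    data: dict[tuple[int, ...], list[float]], names: list[str]
) -> dict[str, float]:
    """F(1, n - 1) for the grand mean and each effect of a 2^k within-subjects design.

    data maps a cell, coded as a tuple of k factor levels (each 0 or 1), to the
    per-subject scores in that cell, in the same subject order for every cell.
    names gives one label per factor.
    """
    k = len(names)
    cells = list(product((0, 1), repeat=k))
    n = len(data[cells[0]])
    results: dict[str, float] = {}
    for size in range(k + 1):
        for effect in combinations(range(k), size):
            contrast = []
            for s in range(n):
                total = 0.0
                for cell in cells:
                    # +1/-1 code: product of the codes of the factors in this effect
                    sign = 1
                    for f in effect:
                        sign *= 1 if cell[f] == 0 else -1
                    total += sign * data[cell][s]
                contrast.append(total / len(cells))
            label = " x ".join(names[f] for f in effect) or "grand mean"
            results[label] = f_from_contrast(contrast)
    return results
```

For the pluck analysis, `names` would be `["laterality", "continuum periodicity"]`, with one list of ten subject shifts per cell; the "grand mean" entry corresponds to the test of the overall shift. The three-factor /wa/ and /ba/ analyses use the same function with three names. p values can be read from an F table or, if SciPy is available, computed with `scipy.stats.f.sf(F, 1, n - 1)`.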


Insert  Figure  6  About  Here 
Insert Figure 7 About Here


The results for the /wa/ adaptors also paint a consistent picture. These
data were examined with a three-way analysis of variance, testing the effects of
adaptor periodicity (voiced /wa/ versus whispered /wa/), continuum periodicity
(voiced versus whispered test items), and laterality (ipsilateral versus
contralateral adaptation). As is clear in Figures 6 and 7, the continuants
produced large labeling shifts; the mean change of 24.1% was quite reliable,
F(1,9) = 141.44, p<.001. There was no main effect of periodicity, either of test
series (F<1) or of adaptor, F(1,9) = 1.32, n.s. There was, however, a robust effect
of laterality: Same-ear adaptation produced shifts twice as large as those caused by
cross-ear adaptation, F(1,9) = 58.44, p<.001. By the logic of the laterality
manipulation, this difference indicates a roughly equal mix of central and
peripheral effects for continuants.

None of the two-way interactions approached significance, with all F's < 1.
However, the three-way interaction was reliable, F(1,9) = 9.27, p<.02. The basis
for this interaction is apparent both in theory and in Table 2: Matching the
periodicity of adaptor and test series (a two-way interaction) makes a difference
with ipsilateral adaptation, but is irrelevant in the contralateral case. Recall
that contralateral testing is assumed to tap central, abstract representations.
Such representations should be relatively insensitive to acoustic details, such as
whether a /wa/ was voiced or whispered. Ipsilateral testing, in contrast, is
assumed to include a peripheral component that Samuel and Newport (1979) likened to
a "neural spectrogram". Therefore, under ipsilateral conditions, acoustically
matching adaptor and test items should make a difference, and it does.
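The interaction can be made concrete with the /wa/ cells of Table 2. The short Python sketch below computes, separately for each laterality condition, the mean shift when adaptor and test series share periodicity minus the mean shift when they do not. It is a descriptive contrast on the tabled means, not the analysis of variance reported above; the dictionary layout and the name `match_advantage` are illustrative only.

```python
# Labeling shifts (percent) for the /wa/ adaptors, copied from Table 2.
# Keys are (adaptor periodicity, test-series periodicity).
WA_SHIFTS = {
    "ipsilateral": {
        ("voiced", "voiced"): 39.9,
        ("voiced", "whispered"): 29.8,
        ("whispered", "voiced"): 25.2,
        ("whispered", "whispered"): 33.9,
    },
    "contralateral": {
        ("voiced", "voiced"): 15.8,
        ("voiced", "whispered"): 17.7,
        ("whispered", "voiced"): 17.1,
        ("whispered", "whispered"): 13.3,
    },
}


def match_advantage(cells: dict) -> float:
    """Mean shift for matched periodicity minus mean shift for mismatched."""
    matched = [v for (adaptor, series), v in cells.items() if adaptor == series]
    mismatched = [v for (adaptor, series), v in cells.items() if adaptor != series]
    return sum(matched) / len(matched) - sum(mismatched) / len(mismatched)


for laterality, cells in WA_SHIFTS.items():
    print(f"{laterality:<13} match advantage = {match_advantage(cells):+.2f} points")
```

With these values the matched-periodicity advantage is about +9.4 points under ipsilateral adaptation and about -2.85 points under contralateral adaptation, which is the pattern the three-way interaction describes.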
Insert Figure 8 About Here


Insert  Figure  9  About  Here 


The results for the /ba/ adaptation conditions are shown in Figures 8 and 9.
Although the overall shift of 8.6% was quite reliable (F(1,9) = 24.60, p<.001), the
pattern of results was rather odd in some respects. The data were analyzed with
the same three factors used for the /wa/ conditions. As in all of the other
analyses, there was no effect of test series periodicity, F<1. There was a
noticeable but nonsignificant trend for the whispered adaptor to be more effective
than the voiced one, F(1,9) = 3.34, p>.10. The most bizarre aspect of the data was
the significant effect of laterality in the wrong direction: Contralateral
conditions actually yielded bigger shifts than ipsilateral ones, F(1,9) = 5.76, p<.05.
Examination of Table 2 reveals that this was due to the total lack of an effect for
both ipsilateral conditions of voiced /ba/, and a similarly small effect for one of
the ipsilateral whispered /ba/ conditions.

This pattern led to significant interactions of adaptor periodicity with
continuum periodicity (F(1,9) = 9.68, p<.02), and of laterality with continuum
periodicity, F(1,9) = 8.00, p<.02. There was no three-way interaction, F(1,9) =
1.50, n.s.

Given that no plausible theory could account for larger contralateral effects
than ipsilateral ones, we hypothesized that the null ipsilateral effects were a
fluke. To test this assumption, we re-ran the voiced /ba/ conditions with a new
group of twelve subjects. These subjects were run under conditions identical to
those in the main experiment, but for only two sessions. Ear of test items and
order of ipsilateral/contralateral conditions were counterbalanced over four groups
of subjects.


Insert  Figure  10  About  Here 


The results for the new subjects are shown in Figure 10, and are summarized
in the bottom row of Table 2. As is clear in the Figure, the data are much more
sensible than before, confirming the aberrant nature of the previous results. A
two-factor analysis of variance was conducted on the data from the new subjects,
examining the effects of laterality and test series periodicity. The overall shift
of 14.6% was reliable, F(1,10) = 13.41, p<.005. Neither laterality (F<1) nor
continuum periodicity (F(1,10) = 1.77, n.s.) had an effect, and their interaction
did not reach significance, F(1,10) = 2.62, n.s. Individual tests of the four
conditions shown in Figure 10 confirmed that all were significant (smallest F(1,10)
= 5.78, p < .04).


General  Discussion 


Several issues have been addressed in the two experiments of the present
study. These issues revolve around the levels of representation of voiced and
whispered stops and continuants. At a general level, the pattern of results
supports the simple intuition of native speakers that whispered speech is
fundamentally similar to its voiced counterpart: Across all of the various
manipulations used in the two experiments, the voiced and whispered stimuli behaved
similarly. This result is less trivial than it might at first appear, given the
gross differences in spectral composition between voiced and whispered productions.

The results of the present study suggest that the perceptual similarity of
acoustically discrepant tokens may be mediated by an abstract level of
representation. Recall that there is a converging body of evidence that supports
the existence of two discernible levels of representation. In terms of this
two-level theory, the similarity of voiced and whispered speech may be traced to
the "complex acoustic", or central, level (Samuel and Newport, 1979; Sawusch, 1977).

A critical piece of evidence for this claim is the significant adaptation
effect of the pluck on the whispered /ba/-/wa/ continuum found in both experiments.
Previous work (Diehl, 1976; Samuel and Newport, 1979) has shown that this nonspeech
sound induces changes in identification of a normal (voiced) /ba/-/wa/ series. In
addition, Samuel and Newport showed that the pluck was ineffective on an aperiodic
/ča/-/ša/ continuum that, like /ba/-/wa/, varied in rise time. These and other
results led to the conclusion that /b/ is represented as a periodic sound with an
abrupt onset; the pluck matches these properties, and thereby acts like /b/ in the
adaptation paradigm.

Within this context, the efficacy of the pluck on the whispered /ba/-/wa/
series has three important implications. First, a whispered /b/, though
acoustically aperiodic, is psychologically periodic; it really is a /b/. Second,
the adapting effect of the pluck is occurring at this abstract level, rather than in
the "neural spectrogram" of the simple acoustic level. These two conclusions
follow from the reliable adaptation effect despite the acoustic mismatch of the
pluck to the whispered speech. Finally, the complex acoustic level being affected
cannot be speech-specific, or "phonetic", as it has often been called. This
follows from the simple fact that the pluck is clearly a nonspeech sound, yet it
is affecting (and thus being processed by) this level of representation. These
results converge with successful adaptation results that Samuel and Newport
reported using a filtering manipulation to preempt acoustic overlap of nonspeech
adaptor and speech syllables.

The results with the pluck adaptor in Experiment 2 provide further evidence
for these conclusions. The adaptation effect using the pluck sound was unaffected
by either the periodicity of the test items or the laterality manipulation. The
equivalence of contralateral adaptation to ipsilateral adaptation provides strong
support for the claim that these effects may be traced to a central, complex
acoustic level of representation.

Insert  Table  3  About  Here 


The pattern of adaptation effects found with the /b/ and /w/ adaptors also
can be used to test the utility of distinguishing between simple acoustic and
complex acoustic levels of representation. To examine this issue, it is helpful to
organize the various conditions of Experiments 1 and 2 in terms of three factors:
(1) whether the adaptor was a stop (/b/) or a continuant (/w/); (2) whether the
adaptation would only affect central mechanisms (contralateral), or both central
and peripheral (binaural/ipsilateral); and (3) whether the adaptor came from the
same series as the test items (matched periodicity), or from the other series
(mismatched periodicity). Table 3 summarizes the results of the two experiments in
terms of these factors.
Consider first the pattern of adaptation as a function of laterality. In the
contralateral conditions, neither the type of adaptor (stop versus continuant) nor
the match/mismatch of adaptor and test series made any difference; the shifts only
range from 13.3% to 17.4%. This pattern is consistent with the view that
contralateral testing taps central mechanisms that are removed from the acoustic
details, and that are comparable for stops and continuants. The pattern is quite
different when adaptation is conducted under either binaural or ipsilateral
monaural conditions. As the Table shows, these conditions show big effects of
adaptor type: /w/ adaptors average 32.1% shifts versus only 11.8% for adaptation
with /b/. Similarly, within-series effects (26.4%) are noticeably bigger than
between-series effects (17.5%). These differences support the view that these testing
conditions tap an additional peripheral component that is sensitive to the
acoustics, namely the simple acoustic level. Moreover, the difference between /b/ and
/w/ effects suggests that continuants have a more substantial peripheral
representation, or at least one that is more susceptible to adaptation. This point
will be considered further shortly.

Looking at Table 3 in terms of the type of adaptor, rather than in terms of
laterality, leads to the same conclusions. For stops, laterality condition does
not matter; contralateral effects (14.7%) were actually slightly larger than
binaural/ipsilateral ones (11.8%). Again, this suggests that primarily central
representations were affected, with little or no contribution of peripheral
adaptation. The results for the continuants clearly contrast with those for the
stops; binaural/ipsilateral effects (32.1%) were twice as large as contralateral
ones (16.0%). As noted previously, continuants show roughly equal effects of
adaptation at the simple acoustic (peripheral) and complex acoustic (central)
level.
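The additive reasoning behind these comparisons can be written out explicitly. The sketch below assumes, as the laterality logic does, that contralateral adaptation reaches only the central (complex acoustic) level while binaural/ipsilateral adaptation reaches both levels, so the peripheral contribution is estimated as the difference between the two. The inputs are the absolute Table 3 means; strict additivity of the two levels is an assumption, and the name `decompose` is illustrative.

```python
# Absolute mean shifts (percent) from Table 3.
TABLE3_MEANS = {
    "stop": {"ipsi_or_binaural": 11.8, "contralateral": 14.7},
    "continuant": {"ipsi_or_binaural": 32.1, "contralateral": 16.0},
}


def decompose(ipsi: float, contra: float) -> tuple[float, float]:
    """Split an ipsilateral/binaural shift into (central, peripheral) parts.

    Assumes additivity: contralateral adaptation reflects the central level
    only, so peripheral = ipsilateral/binaural shift - contralateral shift.
    """
    central = contra
    peripheral = ipsi - contra
    return central, peripheral


for adaptor, m in TABLE3_MEANS.items():
    central, peripheral = decompose(m["ipsi_or_binaural"], m["contralateral"])
    print(f"{adaptor:<11} central={central:5.1f}  peripheral={peripheral:5.1f}")
```

For stops the peripheral estimate comes out slightly negative (about -2.9 points), which under this model amounts to no peripheral contribution; for continuants it is about 16.1 points, essentially equal to the 16.0-point central estimate.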

The conclusion that /b/ is primarily centrally represented is at odds with a
recent study by Jamieson and Cheesman (1986). These investigators also used the
same-ear/different-ear methodology, and found that voiced stops (on a voice onset
time continuum) displayed weak cross-ear effects. As such, they concluded that
voiced stops were represented primarily peripherally. The conflicting conclusions
may be reconciled with two assumptions: (1) The Jamieson and Cheesman study
underestimates a central component for /b/; (2) The present study underestimates a
peripheral component for /b/.

Jamieson and Cheesman tested both /ba/-/pa/ and /da/-/ta/ continua, and found
small but consistent effects with contralateral adaptation with the voiced tokens.
Due to the details of their testing procedure, the data presented for /ba/-/pa/
cannot accurately be compared to the results of the present study. The /da/-/ta/
results are broken down in a way that makes this comparison possible, and these
results include a reasonably robust, if not overwhelming, effect for contralateral
/da/. Overall, their data indicate more of a central component than a simple gloss
might suggest.

An important difference between the Jamieson and Cheesman study and the
present one lies in the details of the speech synthesis. The /b/ in their study
had 40 ms formant transitions, whereas the /b/ endpoint here had 20 ms transitions.
These short transitions were used because transition duration and rise time are not
independent in the Klatt (1980b) synthesizer, and a short rise time was needed in
order to provide a test of the pluck adaptor. It is possible that the brevity of
the transitions minimized the contribution of peripheral-level adaptation. If such
adaptation depends on some function of stimulus energy, the short transitions would
be relatively ineffective. This would lead to an underestimate of simple acoustic
effects for /ba/. It could also contribute to the instability of adaptation
effects found with /b/ in Experiment 2.
The possible role of stimulus energy in determining the size of
peripherally-based adaptation has not been studied extensively, and deserves
further investigation. This factor might help to account for variations in adaptor
efficacy across voiced and voiceless adaptors (see Eimas, Cooper, and Corbit, 1973;
Jamieson and Cheesman, 1986; Kat and Samuel, 1984).

The present study has used a combination of techniques to investigate the
levels of representation of voiced and whispered stops and continuants. By
combining nonspeech adaptors, variation of test continuum and adaptor periodicity,
and laterality of adaptation, the two experiments have developed a cohesive
description of two levels of representation. These results converge with a growing
body of evidence supporting a distinction between simple acoustic and complex
acoustic representations. In addition, the results provide data on the
commonalities of whispered and normal speech. These commonalities arise at the
complex acoustic level, following different (i.e., periodicity-specific) processing
at the simple acoustic level. The techniques employed here appear to provide
powerful tools for investigating the various representations of the speech signal
as it goes from a vibration pattern on the basilar membrane to a linguistic
structure.


II.  A  direct  test  of  the  syllable's  perceptual  role 


In many models of speech perception, a sublexical level of processing beyond
the two explored in the whispered speech research has been proposed: the
syllable. Several lines of research have supported the syllable's role in the
chain of processing. First, Massaro (1972) presented listeners with speech stimuli
followed by a masking sound, and found that recognition of the stimulus reached an
asymptote with about 250 msec of processing. This time interval is of
approximately syllabic size. Second, Savin and Bever (1970) found that subjects
were faster to report the occurrence of a syllable target (e.g., "pon") than the
occurrence of a phonemic one (e.g., "p") (but see Foss and Swinney, 1973). Third,
in a developmental study, Liberman, Shankweiler, Fischer, and Carter (1974) found
that children demonstrated metalinguistic access to the syllabic level before the
phonemic one. Fourth, in the selective adaptation paradigm, Ades (1974) showed
that adaptation effects do not occur if the adaptor and test items are mismatched
in syllabic position (initial versus final); Samuel, Kat, and Tartter (1984) have
extended this position-specificity to intervocalic consonants.

Probably the most widely cited study supporting the role of the syllable in
speech perception is one by Huggins (1964). Huggins presented listeners with
speech that alternated between the ears, and found that an ear-alternation rate of
approximately four switches per second was maximally disruptive. Note that this
rate corresponds to the time course found by Massaro, and would on average
interrupt each syllable once. Interestingly, when Huggins increased the speech
rate, the maximally disruptive switching rate increased correspondingly, suggesting
once again that interrupting each syllable disrupts comprehension. The only
problem in interpreting this otherwise excellent study is that the inference of a
syllabic role is indirect: The switching was done at a regular rate, with no
correlation to what the syllabic pattern actually was.
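For readers unfamiliar with the manipulation, a regular ear-alternation stimulus of this general kind can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a reconstruction of Huggins' apparatus: switching is abrupt (no onset/offset ramps), the unused ear is silent, and the name `alternate_ears` is hypothetical. At 4 switches per second each ear receives 250 ms segments, roughly one syllable's worth of speech.

```python
import numpy as np


def alternate_ears(mono: np.ndarray, sr: int, switches_per_sec: float) -> np.ndarray:
    """Route a mono signal alternately to the left and right ears.

    Each segment lasts 1 / switches_per_sec seconds, and the signal jumps to
    the other ear at every switch. Returns an array of shape (n_samples, 2)
    with column 0 = left ear and column 1 = right ear. Switching is abrupt.
    """
    seg_len = max(1, int(round(sr / switches_per_sec)))
    n = len(mono)
    stereo = np.zeros((n, 2), dtype=mono.dtype)
    # Even-numbered segments go to the left ear, odd-numbered to the right.
    ear = (np.arange(n) // seg_len) % 2
    stereo[np.arange(n), ear] = mono
    return stereo


# Example: one second of a placeholder 16 kHz signal at 4 switches/s.
sr = 16000
speech = np.random.default_rng(0).standard_normal(sr)
stereo = alternate_ears(speech, sr, 4.0)
```

Under the syllable account, speeding playback by a factor k shortens syllables by the same factor, so the most disruptive `switches_per_sec` should scale to roughly k times 4.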

A more direct test of the syllable's role in this phenomenon was reported by
Huggins (1967). This test involved presenting a passage with various ear-switching
conditions that were directly related to the syllabic structure of the passage. In
particular, Huggins compared performance when all syllables were disrupted to
performance when none were disrupted, with alternation rate matched. This test
actually found no effect of syllabic structure. Perhaps because this study was not
as comprehensive and impressive as the Huggins (1964) study, this failure to
support the syllable's role has disappeared from the literature.

The series of experiments we have conducted had three goals. First, we
wished to replicate the basic effect of Huggins (1964). Second, we wished to test
the syllable's role directly, using a set of materials substantial enough to be
convincing whichever way the results turned out. Finally, assuming the syllable
does not play a critical role, despite its wide citation, we decided to test
whether the ear disruption effect is speech-specific, or is instead a general
property of the perception of complex auditory signals.

The results of our work in this domain can be summarized succinctly. First,
the basic ear-alternation effect is reasonably robust: We had to degrade the
passages slightly (by adding white noise) to bring our subjects' performance off
the ceiling, but once that was done, we found a pattern over various alternation
rates that was consistent with Huggins (1964); we also replicated the effect of
playback rate.

Our direct test of the syllable's role, like that of Huggins (1967), involved
presenting passages with equal average alternation rates, but with differing
syllabic disruption (all, none, or random). Unlike Huggins (1967), we used a wide
set of materials (twelve passages versus one), and two levels of overall
difficulty. Simply put, there was absolutely no hint of a syllabic role,
confirming Huggins (1967), and disconfirming the widely cited conclusion of Huggins
(1964).
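The logic of this direct test, holding the average alternation rate constant while varying which syllables are split, can be sketched in Python. The sketch assumes that syllable onset and offset times have already been marked for each passage, and it places exactly one switch per syllable in every condition, so the rate is matched by construction. The "random" condition is implemented here as a coin flip per syllable between a boundary switch and a mid-syllable switch; that choice, the function names, and the example syllable times are all hypothetical, since the procedure is not described in detail here.

```python
import random


def switch_times(syllables, condition, seed=0):
    """Ear-switch times (seconds), one per syllable.

    syllables: list of (onset, offset) pairs in seconds, in temporal order.
    condition: "none"   -> switch at each syllable's offset (no syllable split)
               "all"    -> switch at each syllable's midpoint (every syllable split)
               "random" -> midpoint or offset, chosen at random per syllable
    """
    rng = random.Random(seed)
    times = []
    for onset, offset in syllables:
        midpoint = (onset + offset) / 2
        if condition == "none":
            times.append(offset)
        elif condition == "all":
            times.append(midpoint)
        elif condition == "random":
            times.append(midpoint if rng.random() < 0.5 else offset)
        else:
            raise ValueError(f"unknown condition: {condition!r}")
    return times


def n_disrupted(syllables, times):
    """Count syllables containing at least one switch strictly inside them."""
    return sum(any(onset < t < offset for t in times) for onset, offset in syllables)


# Hypothetical syllable boundaries for a short stretch of a passage.
syllables = [(0.00, 0.22), (0.22, 0.41), (0.41, 0.70), (0.70, 0.93)]
for cond in ("none", "all", "random"):
    times = switch_times(syllables, cond)
    print(cond, len(times), n_disrupted(syllables, times))
```

Every condition yields the same number of switches, so any performance difference between "all" and "none" would have to come from syllabic disruption rather than from alternation rate.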

Given the irrelevance of the syllable, we decided to test whether similar
disruption occurs for other complex acoustic signals. We have run one large study
testing recognition of familiar melodies. The data suggest a similar disruption
of perception at about the same alternation rate as with speech. However, there
was no corresponding effect of playback rate. We plan one more experiment, using a
more sensitive technique to determine how similar the speech and music domains are
with respect to the ear-alternation effect.

III.  Perception  of  timbre:  similar  to  speech? 

In the two-level model explored in the whispered speech work, a critical
result was the success of the nonspeech "pluck" in shifting listeners'
identification of speech syllables. This result strongly reinforces Samuel and
Newport's (1979) claim that the second, central, level of representation is best
characterized as "complex acoustic", rather than "phonetic", since a phonetic level
must by definition be speech specific. An implication of this analysis is that
sounds in addition to speech should be represented at this level; recognition of
complex acoustic patterns in general should be mediated by these representations.

If this is so, then we might expect to find nonspeech sounds that produce
patterns of performance similar to those found for speech. In fact, a number of
investigators have reported speech-like results for nonspeech stimuli in domains
such as categorical perception (e.g., Burns and Ward, 1978; Miller, Wier, Pastore,
Kelly, and Dooling, 1976). The experiments conducted in the present line of
research were intended to provide a principled set of studies that examine a
nonspeech domain across a range of phenomena typically studied for speech.

A nonspeech domain that seems to share many of the structural properties of
speech is timbre. In particular, timbre is a multidimensional domain, and the
families of instruments differentiated by timbre can be compared to the families of
phonemes. Thus, just as there are high vowels and low vowels, or stops and
fricatives, there are horns and strings, or percussion and woodwind. The research
summarized in this section involved synthesizing a "horn"-"string" (actually
"trumpet"-"cello") continuum, and using this continuum in various paradigms that
have been extensively used in speech research. The goal is to see whether the
patterns found for speech are found for stimuli varying in timbre.

We have collected data on the timbre continuum in four speech paradigms, and
the results can be summarized as follows:

(1) Categorical perception: This phenomenon has been a hallmark of
speech research. It is defined by the relationship between identification of
stimuli and their discrimination. In theory, categorical perception is present
when subjects can discriminate stimuli only as well as they can identify them;
discrimination can be predicted from identification. In practice, fully
categorical perception is rarely, if ever, observed. However, some stimuli,
notably stop consonants, show rather categorical results. Other consonants, and
vowels, show moderate levels of categorical perception.

Our results for the timbre continuum are comparable to the moderate levels of
categorical perception found with many speech sounds. Subjects produced clean
identification functions ("horn" versus "string"), and the (ABX) discrimination
functions had peaks at the category boundaries, with troughs within categories. The
discrimination function predicted on the basis of identification paralleled the
observed discrimination very nicely, but actual discrimination systematically
exceeded the predicted values. This indicates that subjects were using both categorical
and noncategorical information.
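The comparison of predicted and observed discrimination can be illustrated with the classic Haskins-style prediction for ABX trials. That model assumes listeners covertly label A, B, and X, choose by label when A and B receive different labels, and guess otherwise; with p_A and p_B the probabilities of labeling the two stimuli "horn", the predicted proportion correct is 0.5[1 + (p_A - p_B)^2]. The report does not state which prediction formula was used, so this is one standard formulation offered as an assumption. The function names, the `step` parameter, and the example identification values are hypothetical.

```python
def predicted_abx(p_a: float, p_b: float) -> float:
    """Predicted ABX proportion correct from identification alone.

    Covert-labeling assumption: if A and B get different labels, X is
    matched by its label; if they get the same label, the listener guesses.
    """
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def predicted_function(ident, step=1):
    """Predicted discrimination for each pair of items `step` apart.

    ident: proportion of "horn" responses for each continuum item, in order.
    """
    return [predicted_abx(ident[i], ident[i + step]) for i in range(len(ident) - step)]


# Hypothetical identification function for an eight-item continuum.
ident = [0.98, 0.95, 0.90, 0.70, 0.25, 0.08, 0.03, 0.02]
print([round(p, 3) for p in predicted_function(ident, step=2)])
```

Predicted values peak for pairs that straddle the category boundary and fall toward 0.5 within categories. Observed discrimination that lies above these predictions, as reported for the timbre stimuli, is what signals access to noncategorical information.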

(2) Selective adaptation: In our first annual report, we reviewed much
of the literature and controversy regarding selective adaptation, and put forth a
strong case for its utility. The results of this technique with the timbre stimuli
were quite consistent with those for speech: Repeated presentation of the horn
reduced horn report, and repeated presentation of the string reduced string report.

(3) Paired contrast: The annual report also reviewed the paired contrast
paradigm, and its relationship to selective adaptation. It was argued there that
despite claims to the contrary (e.g., Diehl, 1981), the two paradigms are
dissociable. The results for the timbre stimuli support this contention: Despite
the reliable adaptation effects just reported, no shifts in labeling were found in
the paired contrast paradigm. A plausible, but as yet untested, inference is that
the observed adaptation effects in this domain are occurring at the simple acoustic
level of representation (see Samuel, 1986 for a discussion of this analysis).

(4) Duplex perception: In "duplex perception" experiments (Rand, 1974),
synthetic speech syllables are broken into two pieces, and the pieces are presented
dichotically. One piece includes the second and/or third formant transition(s),
and the other includes the rest of the syllable. Under these conditions, listeners
report hearing the appropriate full syllable in the ear with the bulk of the
syllable, and a nonspeech chirp in the other (e.g., Rand, 1974; Liberman, Isenberg,
and Rakerd, 1980). This "duplex percept" of speech and nonspeech simultaneously
has led a number of researchers to suggest that an "auditory" mode and a "speech
mode" of perception are involved (Liberman, Isenberg, and Rakerd, 1980; Repp,
Milburn, and Ashkenas, 1983).

The timbre stimuli provide a test of the position advanced by these
theorists. If such stimuli can produce duplex perception, then an alternative
model that does not invoke a "speech mode" is needed. At this point, we have
generated a number of synthesis versions, and collected data on their perception.
This work is still in progress. The data in hand are mixed: The timbre stimuli
appear to produce the duplex percept, but the synthesis versions to date have not
provided a very robust effect. We plan to pursue alternative synthesis versions,
to see whether these nonspeech stimuli can produce a robust duplex effect.


IV. Music Restoration

A final line of research to be reported is an extension to the domain of
music of the phonemic restoration effect (Warren, 1970). Warren replaced part of
an utterance with a cough, and found that listeners could not detect the
replacement; they appeared to have restored the missing speech. Samuel (1981)
introduced a methodology for studying the illusion that has been used in the music
restoration work. Stimulus items are constructed in pairs: A replacement item is
comparable to Warren's stimuli, in that a portion of the waveform is replaced with an
extraneous sound (white noise). An added item is constructed by adding the white
noise to the same portion of the waveform that is replaced in the matching item. To
the extent that listeners are perceptually restoring the missing sound in
replacement items, they should sound like added items (intact with an extraneous
noise). Using signal detection analyses, a bias-free measure of how well replacement
items can be distinguished from added (intact) ones is computed (d'); this is the
measure of the perceptual strength of the effect, with lower d' indicating that
replacement items sound more like intact ones, and thus stronger restoration. A bias
parameter (Beta) is also computed that reflects postperceptual bias toward calling
a stimulus intact.
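The paired-stimulus logic and the signal detection measures can be sketched in Python. This is a minimal illustration, not the procedure actually used: it assumes a digitized signal held in a NumPy array, Gaussian white noise at an arbitrary level, and that a "hit" is an "intact" response to an added item while a "false alarm" is an "intact" response to a replacement item. Under the standard equal-variance model, d' = z(H) - z(F) and Beta = exp[(z(F)^2 - z(H)^2)/2]. The function names and the default noise level are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def make_pair(signal: np.ndarray, start: int, stop: int, noise_sd: float = 0.1, seed: int = 0):
    """Build an (added, replaced) item pair that share the same noise burst."""
    noise = np.random.default_rng(seed).normal(0.0, noise_sd, stop - start)
    replaced = signal.copy()
    replaced[start:stop] = noise   # critical segment removed, noise in its place
    added = signal.copy()
    added[start:stop] += noise     # critical segment kept, noise superimposed
    return added, replaced


def d_prime_beta(hit_rate: float, fa_rate: float):
    """Equal-variance signal detection measures: returns (d', Beta).

    hit_rate: P("intact" | added item); fa_rate: P("intact" | replaced item).
    Rates of exactly 0 or 1 must be corrected before calling this.
    """
    if not (0.0 < hit_rate < 1.0 and 0.0 < fa_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2)
    return d_prime, beta
```

A lower d' means replacement items are harder to tell apart from added ones, that is, stronger restoration; a Beta below 1 indicates a liberal tendency to call items intact.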

The work on music restoration is the basis of a dissertation by Lucinda
DeWitt. The only prior work on this phenomenon is essentially a pilot study by
Sasaki (1980) that suggested that music might produce restoration. The goals of
the research are to establish whether music restoration really works, and if so, to
explore what factors affect the illusion. In addition, we are interested in the
relationship of music restoration and phonemic restoration: Are similar patterns
found, and is there reason to posit a unitary mechanism for speech and music
restoration?

We have run a half dozen experiments in this domain, and several more are
planned. The first experiment was a comparison of restoration in words, in
melodies, and in a control condition. The word items were standard phonemic
restoration stimuli like those used by Samuel (1981). The melodies were simple
familiar tunes played on a piano. One note in each melody was either replaced by
noise, or had noise added to it. The control condition was the added or replaced
version of the critical note from each melody; subjects judged whether the item was
a note plus noise, or just noise. The results indicated that (1) music restoration
is real: There was more perceptual restoration in melodies than in control notes;
and (2) phonemic restoration is a stronger effect than music restoration, at least
under these testing conditions.

Several follow-up experiments have explored the role of listener knowledge/note
predictability, using manipulations such as melody familiarity and priming.
These experiments generally showed better discriminability of added from replaced
stimuli with increasing listener knowledge, a result that is contrary to that found
with words, but consistent with the results for sentential predictability (Samuel,
1981). Our latest experiments have looked for an analog to the lexical effects
found with speech. We have recently tested scales, rather than melodies, and these
stimuli appear to behave in the desired fashion. In particular, giving listeners a
longer stretch of a scale (leading up to a note) seems to boost expectation of that
note sufficiently to induce stronger perceptual restoration. We are currently
working on chord stimuli to see if they exhibit similar word-like behavior.

Overall, the work with music is providing interesting comparisons of the role of
expectation in the perception of complex acoustic patterns.
Table 1

Adaptor            Voiced Series   Whispered Series
voiced /ba/           -14.2*            -8.7*
voiced /wa/           +29.6*           +25.9*
whispered /ba/         -4.3            -19.1*
whispered /wa/        +33.2*           +39.5*
pluck                  -3.0            -10.9*
bow                    +1.6             -0.5
abrupt                 -0.2             -5.6
gradual                +4.3             +4.4

Note: Values shown are the percentage changes in labeling the
middle four items of the eight-item continuum. Means marked with
an asterisk reflect shifts significant at the .05 level or
beyond (critical value for t(10) = 2.228).
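The shift measure defined in the table notes can be computed directly from identification functions. The sketch below assumes each function is given as percent "BA" responses for items 1 to 8 of the continuum (the /ba/ end first) and takes the change over items 3 to 6; the function name and the example values are hypothetical. Negative values mean fewer "BA" responses after adaptation, as in the /ba/ and pluck rows, and positive values mean more, as in the /wa/ rows.

```python
def labeling_shift(baseline, adapted):
    """Change in percent "BA" labeling, averaged over the middle four items.

    baseline, adapted: percent "BA" for each item of an eight-item continuum.
    Returns adapted minus baseline, in percentage points.
    """
    if len(baseline) != 8 or len(adapted) != 8:
        raise ValueError("expected eight-item identification functions")
    middle = slice(2, 6)  # items 3, 4, 5 and 6 (zero-based indices 2-5)
    before = sum(baseline[middle]) / 4
    after = sum(adapted[middle]) / 4
    return after - before


# Hypothetical identification functions before and after a /ba/-like adaptor.
baseline = [100, 100, 90, 70, 30, 10, 0, 0]
adapted = [100, 95, 75, 50, 15, 5, 0, 0]
print(labeling_shift(baseline, adapted))  # prints -13.75
```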
TABLE 2

                             Ipsilateral                    Contralateral
Adaptor            Voiced Series  Whispered Series  Voiced Series  Whispered Series
pluck                  -5.0            -8.2             -10.1            -7.6
whispered /wa/        +25.2           +33.9             +17.1           +13.3
voiced /wa/           +39.9           +29.8             +15.8           +17.7
whispered /ba/         -0.5           -13.5             -16.0           -12.8
voiced /ba/ I          -0.7             0.9             -17.0            -9.6
voiced /ba/ II        -21.6           -12.2             -13.7           -16.2

Note: Values shown are the percentage changes in labeling the middle
four items of the eight-item continuum.

The "voiced /ba/ I" data are from the original group of subjects, and
the "voiced /ba/ II" data are from the additional subjects (see text).


TABLE  3


                       STOP  ADAPTOR                      CONTINUANT  ADAPTOR
                   Ipsilateral                         Ipsilateral
                   or Binaural    Contralateral        or Binaural    Contralateral

Same series           -17.1          -13.3               +35.7          +14.5
Different series       -6.4          -16.1               +28.5          +17.4
Mean                  -11.8          -14.7               +32.1          +16.0

Note:  Values  shown  are  the  percentage  changes  in  labeling  the  middle  four  items  of  the  eight-item
continuum.  Contralateral  data  come  from  Experiment  2,  and  binaural/ipsilateral  data  come  from
Experiments  1  and  2.  The  voiced  /ba/  data  from  the  additional  group  of  12  subjects  in
Experiment  2  were  used  for  this  table,  rather  than  the  apparently  aberrant  results  from  the  original  group.
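The entries in Table 3 appear consistent with simple unweighted means of the corresponding cells of Tables 1 and 2. Ipsilateral/binaural cells average four values: the binaural data (Table 1) and the ipsilateral data (Table 2). Contralateral cells average two values from Table 2. The voiced /ba/ II data stand in for the voiced /ba/ adaptor in Table 2. The "Mean" row is the average of the same-series and different-series rows. This grouping is inferred from the arithmetic rather than stated in the note. The sketch below recomputes the cells under that assumption; the dictionary layout and function names are hypothetical.

```python
from statistics import mean

# (adaptor, test series) -> percentage change; values copied from Table 1 (binaural).
TABLE1 = {
    ("voiced /ba/", "voiced"): -14.2, ("voiced /ba/", "whispered"): -8.7,
    ("voiced /wa/", "voiced"): 29.6, ("voiced /wa/", "whispered"): 25.9,
    ("whispered /ba/", "voiced"): -4.3, ("whispered /ba/", "whispered"): -19.1,
    ("whispered /wa/", "voiced"): 33.2, ("whispered /wa/", "whispered"): 39.5,
}
# Table 2, ipsilateral; voiced /ba/ uses the additional group (voiced /ba/ II).
IPSI = {
    ("voiced /ba/", "voiced"): -21.6, ("voiced /ba/", "whispered"): -12.2,
    ("voiced /wa/", "voiced"): 39.9, ("voiced /wa/", "whispered"): 29.8,
    ("whispered /ba/", "voiced"): -0.5, ("whispered /ba/", "whispered"): -13.5,
    ("whispered /wa/", "voiced"): 25.2, ("whispered /wa/", "whispered"): 33.9,
}
# Table 2, contralateral; again voiced /ba/ II.
CONTRA = {
    ("voiced /ba/", "voiced"): -13.7, ("voiced /ba/", "whispered"): -16.2,
    ("voiced /wa/", "voiced"): 15.8, ("voiced /wa/", "whispered"): 17.7,
    ("whispered /ba/", "voiced"): -16.0, ("whispered /ba/", "whispered"): -12.8,
    ("whispered /wa/", "voiced"): 17.1, ("whispered /wa/", "whispered"): 13.3,
}

ADAPTORS = {
    "stop": ("voiced /ba/", "whispered /ba/"),
    "continuant": ("voiced /wa/", "whispered /wa/"),
}


def cell(tables, manner, same_series):
    """Unweighted mean over the tables' cells for one manner and series relation."""
    values = []
    for table in tables:
        for adaptor in ADAPTORS[manner]:
            source = adaptor.split()[0]  # "voiced" or "whispered"
            for series in ("voiced", "whispered"):
                if (series == source) == same_series:
                    values.append(table[(adaptor, series)])
    return mean(values)


def table3():
    """Recompute Table 3 under the assumed averaging scheme."""
    locations = {"ipsilateral/binaural": (TABLE1, IPSI), "contralateral": (CONTRA,)}
    result = {}
    for manner in ADAPTORS:
        for location, tables in locations.items():
            same = cell(tables, manner, True)
            different = cell(tables, manner, False)
            result[(manner, location)] = {
                "same": same,
                "different": different,
                "mean": (same + different) / 2,
            }
    return result
```

Each recomputed value falls within rounding (0.05) of the corresponding published entry. This supports, but does not prove, the averaging scheme assumed here.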
Figure  Captions

Figure  1:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  voiced  /ba/  (top  two  panels)  and  voiced
/wa/  (bottom  two  panels).

2:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  whispered  /ba/  (top  two  panels)  and
whispered  /wa/  (bottom  two  panels).

3:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  pluck  (top  two  panels)  and  bow
(bottom  two  panels).

4:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  abrupt  (top  two  panels)  and  gradual
(bottom  two  panels).

5:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  pluck,  with  ipsilateral  presentation  of
adaptor  and  test  items  (top  two  panels),  or  contralateral
presentation  (bottom  two  panels).

6:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  whispered  /wa/,  with  ipsilateral
presentation  of  adaptor  and  test  items  (top  two  panels),  or
contralateral  presentation  (bottom  two  panels).
7:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  voiced  /wa/,  with  ipsilateral
presentation  of  adaptor  and  test  items  (top  two  panels),  or
contralateral  presentation  (bottom  two  panels).

8:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  whispered  /ba/,  with  ipsilateral
presentation  of  adaptor  and  test  items  (top  two  panels),  or
contralateral  presentation  (bottom  two  panels).

9:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  voiced  /ba/,  with  ipsilateral
presentation  of  adaptor  and  test  items  (top  two  panels),  or
contralateral  presentation  (bottom  two  panels).

10:  Identification  of  the  test  syllables  (percentage  "BA")  before
and  after  adaptation  with  voiced  /ba/,  with  ipsilateral
presentation  of  adaptor  and  test  items  (top  two  panels),  or
contralateral  presentation  (bottom  two  panels),  with  a  new  group
of  subjects.